1. Introduction

Consider the one-dimensional diffusion model 

$$dX_t = b(X_t)\,dt + dW_t, \qquad t \ge 0, \tag{1.1}$$

where $W$ is a standard Brownian motion and $b$ is an unknown drift function belonging
to a class of functions $\mathcal{B}$. We will make assumptions on $b$, stated precisely in the next
section, ensuring that (1.1) has a unique stationary solution $X$. The aim is to make
inference about $b$ on the basis of discrete-time observations $X_0, X_\Delta, \ldots, X_{n\Delta}$, for some
fixed sampling frequency $1/\Delta$.
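To make the observation scheme concrete, the following minimal sketch generates approximate discrete-time data from (1.1). It is an illustration only: the Euler-Maruyama scheme, the function name `simulate_discrete_observations`, the parameter `substeps` and the example drift $-\tanh$ are all assumptions introduced here, not part of the method studied in the paper, and the scheme carries a discretization error.

```python
import numpy as np

def simulate_discrete_observations(b, x0, delta, n, substeps=100, rng=None):
    """Euler-Maruyama approximation of dX_t = b(X_t) dt + dW_t,
    recorded only at the times 0, delta, ..., n * delta."""
    rng = np.random.default_rng() if rng is None else rng
    h = delta / substeps              # fine step between two observations (illustrative choice)
    x = float(x0)
    obs = np.empty(n + 1)
    obs[0] = x
    for i in range(1, n + 1):
        for _ in range(substeps):
            # one Euler step: drift contribution plus a Brownian increment
            x += b(x) * h + np.sqrt(h) * rng.standard_normal()
        obs[i] = x                    # keep only the value at time i * delta
    return obs

def example_drift(x):
    """Hypothetical bounded, mean-reverting drift used only for illustration."""
    return -np.tanh(x)

observations = simulate_discrete_observations(
    example_drift, x0=0.0, delta=0.5, n=200, rng=np.random.default_rng(0)
)
```

Only the array `observations` would be available to the statistician; the path between observation times is discarded.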

Under appropriate conditions the solution to (1.1) is a positively recurrent, ergodic
Markov process with a unique invariant probability distribution. Moreover, under mild
regularity conditions the process has transition densities $p_b(t,x,y)$ relative to Lebesgue
measure. In this case, we can employ a Bayes procedure for making inference about the
drift function $b$. This involves putting a prior distribution $\Pi$ on the set of drift functions
and computing the posterior $\Pi(\cdot \mid X_0, X_\Delta, \ldots, X_{n\Delta})$. If the initial distribution is the
invariant probability measure with density $\pi_b$, the posterior measure of a measurable set
$B \subset \mathcal{B}$ is given by

$$\Pi(B \mid X_0, \ldots, X_{n\Delta}) = \frac{\int_B \pi_b(X_0) \prod_{i=1}^n p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})\,\Pi(db)}{\int_{\mathcal{B}} \pi_b(X_0) \prod_{i=1}^n p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})\,\Pi(db)}.$$

(We assume of course the necessary measurability to ensure that this is well defined.)

This immediately reveals a practical complication, since the transition densities of 
a diffusion process can typically not be computed explicitly. Several approaches have 
been proposed in the literature to circumvent this problem. These include for instance 
simulation-based methods for approximating the transition densities and the closed-form
expansions of Y. Aït-Sahalia; see, for example, Jensen and Poulsen [17] for an overview. A
method that has been proven to be particularly useful for dealing with Bayes procedures 
is to view the continuous segments of the diffusion process between the observations as 
missing data and to employ a Gibbs sampling scheme. In practice this involves repeatedly
simulating diffusion bridges to generate the missing data and drawing from the posterior
distribution of $b$ given the augmented, continuous data $(X_t : t \in [0, n\Delta])$. Several
schemes have been devised to simulate the diffusion bridges, see, for example, Elerian 
et al. [8], Eraker [9], Roberts and Stramer [20], Beskos et al. [2], Golightly and Wilkin- 
son [13] and Chib et al. [5]. Drawing from the continuous-data posterior can be done 
by more conventional methods, because contrary to the discrete-observations likelihood, 
the continuous-data likelihood has a known closed form expression given by Girsanov's 
theorem. 
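As a rough illustration of why the continuous-data likelihood is tractable, the sketch below evaluates a discretized version of the Girsanov log-likelihood $\int_0^T b(X_t)\,dX_t - \frac{1}{2}\int_0^T b^2(X_t)\,dt$ (relative to Wiener measure) on a finely sampled path. The function name and the left-endpoint Riemann-Itô approximation are assumptions made for this example, and the contribution of the initial density is deliberately left out.

```python
import numpy as np

def girsanov_log_likelihood(b, path, h):
    """Approximate int b(X) dX - 0.5 * int b(X)^2 dt for a path sampled with step h."""
    x = np.asarray(path, dtype=float)
    drift = b(x[:-1])                 # evaluate b at left endpoints (Ito convention)
    increments = np.diff(x)           # X_{t+h} - X_t
    stochastic_integral = np.sum(drift * increments)
    quadratic_term = 0.5 * h * np.sum(drift ** 2)
    return float(stochastic_integral - quadratic_term)
```

For a drift that is linear in a Euclidean parameter this expression is quadratic in the parameter, which is one reason why the augmented-data step of the Gibbs sampler is comparatively easy.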

For parametric models, where the drift function is known up to a Euclidean param- 
eter that has to be estimated, the outlined approach has been shown to provide an 
effective method for dealing with discretely observed diffusions. The approach is however 
not essentially limited to a parametric setup. The methodology has great potential to 
be developed into a practically feasible methodology in nonparametric settings as well. 
It is however very well known that in Bayesian nonparametrics the choice of the prior 
distribution is crucial and posterior consistency is not automatically guaranteed (e.g., Di- 
aconis and Freedman [7]). This motivates the study of posterior consistency for discretely 
observed diffusions carried out in this paper. 

In the i.i.d. setting, sufficient conditions for posterior consistency were first obtained
by Schwartz [22]. See also Barron et al. [1], Ghosal and van der Vaart [10] and Shen 
and Wasserman [23]. Here we consider discrete observations from a diffusion model (1.1), 
which constitute a Markov chain. A number of recent papers have investigated the prob- 
lem of posterior consistency or convergence rates for Markov data, cf. for example, Ghosal 
and van der Vaart [12], Ghosal and Tang [11], Tang and Ghosal [24]. The results in these 
papers do however not immediately lead to practically useful results for our setting. The 
problem lies again in the fact that in our case, the transition densities of the model 
are typically not analytically tractable. Since the conditions for consistency given for 
instance by Tang and Ghosal [24] involve the transition densities, they cannot be readily 
used to verify consistency for a given prior in our discretely observed diffusion model. 
The aim in this paper is to formulate conditions involving only the coefficients appearing 
in the stochastic differential equation (1.1). We achieve this by adapting the results of
Tang and Ghosal [24] to the present setting. Basically, we need two assumptions. Firstly,
if $\mu_0$ denotes the true invariant probability measure of the process $X$, we require that
the prior puts positive mass on balls $\{b \in \mathcal{B} : \|b - b_0\|_{2,\mu_0} < \varepsilon\}$ for each $\varepsilon > 0$
($\|\cdot\|_{2,\mu_0}$ denotes a weighted $L^2$-norm and $b_0$ denotes the true drift). This is a natural condition,
since if the prior excludes the true drift, consistency can never be obtained. Secondly, we
need an equicontinuity assumption (Definition 3.4), which limits the size, or rather the
complexity, of the set of drift functions. Under these assumptions, we obtain posterior
consistency (Theorem 3.5): the posterior measure of appropriately defined weak
neighborhoods of the true drift function $b_0$ converges to 1 almost surely, as the number of
observations $n$ tends to infinity. This is the main result of the paper.

Ghosal and van der Vaart [12] give conditions from which the posterior rate of conver- 
gence for Markov chain data can be calculated. These conditions are a combination of 
a prior mass condition and a testing condition. This testing condition requires that one 
can test the true drift function against balls of alternatives with exponentially decaying 
error probabilities. Such tests are not easily constructed in the present setup. Appropri- 
ate tests for Markov chains have been shown to exist under certain (lower) bounds on 
the transition probabilities (e.g., [3]). In our setup such bounds do however not seem to 
be valid in general. An interesting line of future research would be to extend or adapt the 
available testing results for Markov chains to the setting of discretely observed diffusions. 
This may not only give posterior consistency results in a stronger topology, but may pave 
the way for obtaining posterior rates of convergence as well. In the present paper, we 
completely avoid the construction of tests. Instead we employ martingale arguments in 
a similar fashion as Tang and Ghosal [24], who adapted the approach of Walker [25] to 
the Markov chain setting. 

The remainder of the paper is organized as follows. In Section 2, preliminaries on the 
statistical model and Bayes procedure are outlined. The main consistency result of this 
paper is formulated in Section 3. Examples of priors that satisfy the requirements for 
consistency are given in Section 4. The paper ends with a proof of the main result and 
some concluding remarks. The Appendix contains a technical lemma. 

1.1. Notation 

$\|g\|_{p,\nu} = \big(\int |g|^p\,d\nu\big)^{1/p}$: $L^p$-norm relative to the measure $\nu$.

$L^2(\mu)$: space of square integrable functions with respect to the measure $\mu$.

$C(A)$, $BC(A)$: space of continuous functions, space of bounded continuous functions
defined on $A \subset \mathbb{R}$.

$C^s(\mathbb{R})$, for $s \in (0,1)$: space of $s$-Hölder functions, that is, functions $f$ for which

$$\|f\|_s = \|f\|_\infty + \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|^s} < \infty. \tag{1.2}$$

$\mathcal{L}(X)$: law of a random variable $X$.

$P_b^\mu$: law of the solution of the SDE (1.1), with $\mathcal{L}(X_0) = \mu$, generated on the canonical
path space $C(\mathbb{R}_+)$.

$\mu_b$: invariant measure.

$P_b$: short-hand notation for $P_b^{\mu_b}$.

$P_b^x$: short-hand notation for $P_b^{\delta_x}$, where $\delta_x$ denotes Dirac measure at $x \in \mathbb{R}$.

$\mu_0$: short-hand notation for $\mu_{b_0}$.

$\pi_b$: density of invariant measure.

$(P_t^b)_{t \ge 0}$: transition semigroup associated with the diffusion.

$p_b(t,x,y)$: transition density.



2. Setup 

2.1. Description of the diffusion model 

In this section, we give a precise description of the diffusion model that we consider. Let
$\mathcal{B} \subset C(\mathbb{R})$ be a collection of continuous functions on $\mathbb{R}$. For $b \in \mathcal{B}$ and a fixed number
$c \in \mathbb{R}$, let the function $s_b : \mathbb{R} \to \mathbb{R}$ be defined by

$$s_b(x) = \int_c^x \exp\Big(-2 \int_c^y b(z)\,dz\Big)\,dy.$$

We assume that

$$\lim_{x \to -\infty} s_b(x) = -\infty, \qquad \lim_{x \to \infty} s_b(x) = \infty$$

for all $b \in \mathcal{B}$. The finiteness (or nonfiniteness) of these limits does not depend on the
choice of $c$ (see page 339 in Karatzas and Shreve [19]). It is classical that under these
assumptions, for every $x \in \mathbb{R}$ and $b \in \mathcal{B}$ the SDE

$$dX_t = b(X_t)\,dt + dW_t, \qquad X_0 = x,$$

has a unique weak solution. Let $P_b^x$ denote the law that this solution generates on the
canonical path space $C(\mathbb{R}_+)$. Then in the commonly used terminology of Itô and McKean
[16] or Kallenberg [18], Chapter 23, the collection of laws $(P_b^x : x \in \mathbb{R})$ constitutes a
canonical, recurrent diffusion on the real line. In other words, for $X$ the canonical process
on $\Omega = C(\mathbb{R}_+)$ defined by $X_t(\omega) = \omega(t)$ we have the following:

(i) Under $P_b^x$ the process $X$ starts in $x$, that is, $P_b^x(X_0 = x) = 1$ for all $x \in \mathbb{R}$.

(ii) The process $X$ is strong Markov.

(iii) For all $x \in \mathbb{R}$, the process $X$ is recurrent under $P_b^x$.

For a probability measure $\mu$ on $\mathbb{R}$ we define, as usual, $P_b^\mu(B) = \int P_b^x(B)\,\mu(dx)$ for a
measurable set $B$. Then under $P_b^\mu$ the law of $X_0$ equals $\mu$ and $X$ is the weak solution of

$$dX_t = b(X_t)\,dt + dW_t, \qquad \mathcal{L}(X_0) = \mu.$$
As the notation suggests, $s_b$ is the scale function of the diffusion. The speed measure
is denoted by $m_b$. In the present setting, it is the Borel measure on $\mathbb{R}$ given by

$$m_b(dx) = \exp\Big(2 \int_c^x b(z)\,dz\Big)\,dx. \tag{2.1}$$

We assume that the speed measure is finite, that is, $m_b(\mathbb{R}) < \infty$. This ensures that the
diffusion is positively recurrent and ergodic in the sense that for all $x \in \mathbb{R}$,

$$X_t \overset{P_b^x}{\Longrightarrow} \mu_b \tag{2.2}$$

as $t \to \infty$, where $\mu_b = m_b / m_b(\mathbb{R})$ is the normalized speed measure (cf., e.g., Kallenberg
[18], Theorem 23.15). We will write $\mu_0 = \mu_{b_0}$. The measure $\mu_b$ is the unique invariant
probability measure of the diffusion. In particular, the process $X$ is stationary under
$P_b^{\mu_b}$. It is easily verified that under our conditions, $\mu_b$ has a continuously differentiable
Lebesgue density $\pi_b$. Moreover, it follows from (2.1) that we have the relation

$$b = \frac{1}{2}\,\frac{\pi_b'}{\pi_b}. \tag{2.3}$$

We denote the transition semigroup associated to the diffusion by $(P_t^b)_{t \ge 0}$. In other
words, for a bounded measurable function $f$ on $\mathbb{R}$ and $x \in \mathbb{R}$ we have $P_t^b f(x) = E_b^x f(X_t)$,
where $E_b^x$ is the expectation associated to $P_b^x$. The operator $P_t^b$ maps the space $BC(\mathbb{R})$ of
bounded, continuous functions on $\mathbb{R}$ into itself (see, e.g., (the proof of) Theorem 23.13 of
Kallenberg [18], or Rogers and Williams [21], Proposition V.50.1). A regular diffusion as
we are considering is known to have positive transition densities with respect to its speed
measure, cf., for example, Itô and McKean [16], Section 4.11. Since the speed measure
has a positive Lebesgue density under our assumptions, we have in fact the existence of
transition densities $p_b : (0,\infty) \times \mathbb{R} \times \mathbb{R} \to (0,\infty)$ such that for all bounded, measurable
functions $f$, $x \in \mathbb{R}$ and $t > 0$,

$$P_t^b f(x) = \int p_b(t,x,y)\,f(y)\,dy.$$



For more background on the theory of one-dimensional diffusions and relevant refer- 
ences to the literature, see, for instance, Borodin and Salminen [4]. 
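The link between the drift, the speed measure and the invariant density can be checked numerically. The sketch below is an illustration under assumptions made here (a uniform grid, trapezoidal quadrature, the hypothetical drift $b(x) = -\tanh x$): it normalizes the speed density $\exp(2\int b)$ to obtain $\pi_b$, which for this drift equals $\tfrac{1}{2}\operatorname{sech}^2 x$, and the drift can then be recovered from $\pi_b'/(2\pi_b)$.

```python
import numpy as np

def invariant_density(b, grid):
    """Normalised speed density pi_b(x), proportional to exp(2 * int b), on a grid.

    The reference point of the inner integral only changes the normalising
    constant, so the integral is started at the left end of the grid.
    """
    x = np.asarray(grid, dtype=float)
    bx = b(x)
    dx = np.diff(x)
    # cumulative trapezoidal integral of b from grid[0] to each grid point
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (bx[1:] + bx[:-1]) * dx)))
    log_density = 2.0 * integral
    log_density -= log_density.max()          # guard against overflow
    density = np.exp(log_density)
    mass = np.sum(0.5 * (density[1:] + density[:-1]) * dx)
    return density / mass

grid = np.linspace(-20.0, 20.0, 8001)
pi_b = invariant_density(lambda z: -np.tanh(z), grid)
# recover the drift from the density via b = pi_b' / (2 * pi_b)
recovered_drift = 0.5 * np.gradient(np.log(pi_b), grid)
```

The grid has to be wide enough that the mass of the speed measure outside it is negligible; this is a choice of the example, not a requirement stated in the text.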



2.2. Statistical model and Bayes procedure 

Consider the setting described in the preceding section, that is, we have a collection
$\mathcal{B} \subset C(\mathbb{R})$ such that every $b \in \mathcal{B}$ determines an SDE that generates an ergodic diffusion
on $\mathbb{R}$. For $b \in \mathcal{B}$, let $P_b$ be defined by $P_b = P_b^{\mu_b}$. In other words, under $P_b$ the canonical
process $X$ on $C(\mathbb{R}_+)$ is the unique stationary solution of the SDE

$$dX_t = b(X_t)\,dt + dW_t.$$

We assume that for some fixed $\Delta > 0$ and a natural number $n$, we have $n + 1$ observations
$X_0, X_\Delta, \ldots, X_{n\Delta}$ from $X$ under $P_{b_0}$, for some "true" drift function $b_0 \in \mathcal{B}$. The aim is
to infer the drift function $b_0$ from these data.

In our Bayesian approach, we assume that the model $\mathcal{B}$ is a measurable subset of $C(\mathbb{R})$
and we put a prior distribution $\Pi$ on it. Next, we consider the posterior distribution
$\Pi(\cdot \mid X_0, \ldots, X_{n\Delta})$ on $\mathcal{B}$, which is given by

$$\Pi(B \mid X_0, \ldots, X_{n\Delta}) = \frac{\int_B \pi_b(X_0) \prod_{i=1}^n p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})\,\Pi(db)}{\int_{\mathcal{B}} \pi_b(X_0) \prod_{i=1}^n p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})\,\Pi(db)}.$$

In the next section, we provide sufficient conditions under which the posterior
asymptotically concentrates its mass around the true drift function $b_0$ as $n \to \infty$.
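The posterior formula can be evaluated exactly in the rare cases where the transition densities are known. The sketch below does this for a prior on finitely many linear drifts $b(x) = -\theta x$ (Ornstein-Uhlenbeck processes), whose Gaussian transition law is a standard fact. It is meant only as a toy illustration: linear drifts are unbounded and therefore fall outside the uniformly bounded classes considered later, and all function names are assumptions of this example.

```python
import numpy as np

def ou_log_likelihood(theta, obs, delta):
    """log pi_b(X_0) + sum_i log p_b(delta, X_{(i-1)delta}, X_{i delta}) for b(x) = -theta * x."""
    obs = np.asarray(obs, dtype=float)
    stat_var = 1.0 / (2.0 * theta)                                   # variance of mu_b
    trans_var = (1.0 - np.exp(-2.0 * theta * delta)) / (2.0 * theta)
    means = obs[:-1] * np.exp(-theta * delta)                        # conditional means
    log_init = -0.5 * np.log(2.0 * np.pi * stat_var) - obs[0] ** 2 / (2.0 * stat_var)
    log_trans = (-0.5 * np.log(2.0 * np.pi * trans_var)
                 - (obs[1:] - means) ** 2 / (2.0 * trans_var))
    return float(log_init + np.sum(log_trans))

def posterior_weights(thetas, prior, obs, delta):
    """Posterior probabilities of finitely many candidate drifts."""
    log_post = np.log(prior) + np.array(
        [ou_log_likelihood(t, obs, delta) for t in thetas]
    )
    log_post -= log_post.max()                 # stabilise before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

def simulate_ou(theta0, delta, n, rng):
    """Exact draws X_0, X_delta, ..., X_{n delta} from the stationary process."""
    trans_var = (1.0 - np.exp(-2.0 * theta0 * delta)) / (2.0 * theta0)
    obs = np.empty(n + 1)
    obs[0] = rng.normal(0.0, np.sqrt(1.0 / (2.0 * theta0)))
    for i in range(n):
        obs[i + 1] = (obs[i] * np.exp(-theta0 * delta)
                      + np.sqrt(trans_var) * rng.standard_normal())
    return obs
```

With data simulated under one of the candidate values, the posterior weight of that value typically approaches 1 as $n$ grows, which is the behaviour that the consistency result of the next section establishes in a nonparametric setting.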



3. Consistency 

We are interested in conditions under which the posterior asymptotically concentrates
its mass around the true drift function $b_0$. More precisely, we want that under $P_{b_0}$ the
posterior mass concentrates on arbitrarily small neighborhoods of $b_0$. To ensure that
neighborhoods of points $b \ne b_0$ do not receive posterior mass in the limit, the topology
we use to define the neighborhoods should have some separation properties; it should for
instance be Hausdorff.

We define a weak topology on $\mathcal{B}$ through the transition operators $P_\Delta^b$ (see Section 2).
This is justified by the following lemma, which states that identifying the drift parameter
$b$ is in our setting equivalent to identifying $P_\Delta^b$.

Lemma 3.1. If $P_t^b = P_t^{b'}$ for some $t > 0$, then $b = b'$.

Proof. Fix an $x \in \mathbb{R}$ and $b \in \mathcal{B}$. By the semigroup property, the law of $X_{nt}$ under $P_b^x$ is
determined by $P_t^b$. Indeed, for $f$ a bounded measurable function and $n$ a natural number
we have

$$E_b^x f(X_{nt}) = (P_t^b)^n f(x).$$

On the other hand, ergodicity implies that the law of $X_{nt}$ under $P_b^x$ converges weakly to
the invariant distribution $\mu_b$ as $n \to \infty$, cf. (2.2). It follows that $P_t^b$ completely determines $\mu_b$. By
(2.3), $\mu_b$ completely determines $b$ under our assumptions. □

Now let $\nu$ be a finite Borel measure on the state space $\mathbb{R}$. For $b \in \mathcal{B}$, $f \in BC(\mathbb{R})$ and
$\varepsilon > 0$, let

$$U^b_{f,\varepsilon} = \{b' \in \mathcal{B} : \|P_\Delta^{b'} f - P_\Delta^b f\|_{1,\nu} < \varepsilon\}.$$

Consider the topology on $\mathcal{B}$ that is determined by the requirement that for $b \in \mathcal{B}$ the
collection of sets

$$\{U^b_{f,\varepsilon} : f \in BC(\mathbb{R}),\ \varepsilon > 0\}$$

forms a sub-base for the neighborhood system at $b$. By definition, this means that any
open neighborhood of $b \in \mathcal{B}$ is a union of finite intersections of the form
$U^b_{f_1,\varepsilon_1} \cap \cdots \cap U^b_{f_m,\varepsilon_m}$.

Although the topology is defined in a rather indirect fashion, it has the desired Hausdorff
property, that is, different points in $\mathcal{B}$ can be separated by disjoint open sets.

Lemma 3.2. If $\nu$ assigns positive mass to all nonempty open intervals, then the topology
on $\mathcal{B}$ is Hausdorff.

Proof. Consider two functions $b \ne b'$ in $\mathcal{B}$. By Lemma 3.1, we have $P_\Delta^b \ne P_\Delta^{b'}$ and hence
there exists an $f \in BC(\mathbb{R})$ and an $x \in \mathbb{R}$ such that $P_\Delta^b f(x) \ne P_\Delta^{b'} f(x)$. By continuity
there exists in fact a nonempty open interval $J \subset \mathbb{R}$ where the functions $P_\Delta^b f$ and $P_\Delta^{b'} f$
are different. By the assumption on $\nu$, it follows that for some $\varepsilon > 0$,

$$\|P_\Delta^b f - P_\Delta^{b'} f\|_{1,\nu} > \varepsilon.$$

This implies that the neighborhoods $U^b_{f,\varepsilon/2}$ and $U^{b'}_{f,\varepsilon/2}$ are disjoint. □

An alternative point of view on the topology that we use is obtained by considering
the high-frequency limit $\Delta \to 0$. Let $A_b$ be the generator of $X$ under $P_b$, that is, $A_b f =
b f' + f''/2$ for a $C^2$-function $f$. Then for small $\Delta$,

$$P_\Delta^{b_1} f - P_\Delta^{b_2} f \approx \Delta (A_{b_1} f - A_{b_2} f) = \Delta (b_1 - b_2) f'.$$

It follows that for small $\Delta$, the constructed topology is close to the topology induced by
the $L^1(\nu)$-norm on the set of drift functions $\mathcal{B}$.
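The heuristic behind this remark can be spelled out as a first-order expansion of the semigroup. The display below is a sketch only: it assumes that $f$ is smooth with bounded derivatives and that the $o(\Delta)$ remainders are uniform, neither of which is claimed in the text.

```latex
\begin{align*}
P_\Delta^{b} f &= f + \Delta\, A_b f + o(\Delta),
  \qquad A_b f = b f' + \tfrac12 f'', \\
P_\Delta^{b_1} f - P_\Delta^{b_2} f
  &= \Delta\,(A_{b_1} f - A_{b_2} f) + o(\Delta)
   = \Delta\,(b_1 - b_2)\, f' + o(\Delta), \\
\|P_\Delta^{b_1} f - P_\Delta^{b_2} f\|_{1,\nu}
  &\approx \Delta \int |b_1 - b_2|\,|f'|\,d\nu .
\end{align*}
```

The second-order terms $\tfrac12 f''$ cancel because both diffusions have unit diffusion coefficient, so only the difference of the drifts survives.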

Having specified the topology, we can define weak posterior consistency, or just con- 
sistency. 

Definition 3.3. We have weak posterior consistency if for every open neighborhood $U_{b_0}$
of $b_0$, it holds that

$$\Pi(b \notin U_{b_0} \mid X_0, X_\Delta, \ldots, X_{n\Delta}) \to 0 \qquad P_{b_0}\text{-a.s.}$$

as $n \to \infty$. Note that the word "weak" refers to the topology, not to the mode of stochastic
convergence.

Theorem 3.5 below is the main result of this section. It needs the following definition. 

Definition 3.4. We call a collection $\mathcal{F}$ of real-valued functions on the real line locally
uniformly equicontinuous if for every $\varepsilon > 0$ and every compact $K \subset \mathbb{R}$, there exists a
$\delta > 0$ such that

$$\sup_{f \in \mathcal{F}}\ \sup_{x, y \in K,\ |x - y| < \delta} |f(x) - f(y)| < \varepsilon.$$

In Section 4 we give examples of locally uniformly equicontinuous collections of
functions.
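As a quick sanity check of the definition, consider a family whose members share a common Hölder bound. The constants $M$ and $s$ below are assumptions of this example; the choice of $\delta$ does not even depend on the compact set $K$.

```latex
\begin{align*}
&\text{Suppose } |f(x) - f(y)| \le M\,|x - y|^{s}
  \text{ for all } f \in \mathcal{F},\ x, y \in \mathbb{R},
  \text{ with fixed } M > 0,\ s \in (0, 1]. \\
&\text{Given } \varepsilon > 0, \text{ take } \delta = \big(\varepsilon / (2M)\big)^{1/s}.
  \text{ Then for } |x - y| < \delta: \\
&\qquad \sup_{f \in \mathcal{F}} |f(x) - f(y)| \le M\,\delta^{s} = \varepsilon / 2 < \varepsilon .
\end{align*}
```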

Theorem 3.5. Suppose we have discrete-time data from the stationary solution to the 
stochastic differential equation 

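$$dX_t = b(X_t)\,dt + dW_t, \qquad t \ge 0.$$

Denote the invariant measure of the diffusion with drift $b_0$ by $\mu_0$. Let $\Pi$ be a prior on
the set of drift functions $\mathcal{B}$ and suppose that $\mathcal{B}$ is locally uniformly equicontinuous and
$\sup_{b \in \mathcal{B}} \|b\|_\infty < \infty$. Then if

$$\Pi(b \in \mathcal{B} : \|b - b_0\|_{2,\mu_0} < \varepsilon) > 0 \qquad \text{for all } \varepsilon > 0, \tag{3.1}$$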

we have weak consistency (as in Definition 3.3). 

In Bayesian practice, a model set $\mathcal{B}$ is typically not specified explicitly. Usually some
prior $\Pi$ is simply chosen and the procedure is carried out. From this perspective, the
theorem states that if the chosen prior gives mass 1 to a set of functions that is uniformly
bounded and locally uniformly equicontinuous, then we have weak consistency for every
true $b_0$ in the $L^2(\mu_0)$-support of the prior.

Prior mass conditions like (3.1) are standard in results on posterior consistency.
Intuitively, it is reasonable that if we want the posterior to concentrate around $b_0$
asymptotically, the prior should put sufficient mass near $b_0$ too. The uniform boundedness and
equicontinuity conditions limit the size of the support of the prior, which is reasonable
as well. The conditions are somewhat restrictive, but for technical reasons they cannot
be avoided in our approach. In settings where consistency can be derived using testing
arguments, boundedness and equicontinuity conditions can typically be relaxed, and only
need to be valid on certain subsets $\mathcal{B}_n$ of the support $\mathcal{B}$ of the prior with increasing prior
probability. However, since we do not have the appropriate tests available in this case,
we cannot follow such an approach. On the other hand, computational
approaches like the one of Beskos et al. [2] require in fact that both $b$ and its derivative $b'$
are uniformly bounded, which is more restrictive than the conditions of our consistency
theorem.

The proof of the theorem is deferred to Section 5. In the next section, we first consider 
a number of concrete priors for which the assumptions of the theorem are verified. 

4. Examples of concrete priors 

The following example is perhaps of little practical relevance, but it shows already that 
there is an abundance of priors available that yield posterior consistency. 

Example 4.1 (Discrete net priors). Let the collection of drift functions $\mathcal{B}$ satisfy
the requirements of Theorem 3.5. That is, $\mathcal{B}$ is locally uniformly equicontinuous and
$\sup_{b \in \mathcal{B}} \|b\|_\infty < \infty$. To construct the prior choose two probability distributions $(p_m)$ and
$(q_n)$ on the positive integers such that $p_n, q_n > 0$ for $n$ large enough, and a decreasing
sequence of positive numbers $\varepsilon_n \downarrow 0$. For $m \ge 1$, let $\mathcal{B}_m = \{b|_{[-m,m]} : b \in \mathcal{B}\}$ be the set of
restrictions of functions in $\mathcal{B}$ to the interval $[-m,m]$. The functions in $\mathcal{B}_m$ are uniformly
equicontinuous and hence, by the Arzelà-Ascoli theorem, $\mathcal{B}_m$ is totally bounded for the
uniform norm. For every $n$, we fix a finite $\varepsilon_n$-net $\mathcal{B}_{m,\varepsilon_n}$ for $\mathcal{B}_m$, that is, $\mathcal{B}_{m,\varepsilon_n}$ is a
finite set such that every element of $\mathcal{B}_m$ is within uniform distance $\varepsilon_n$ of some element
of $\mathcal{B}_{m,\varepsilon_n}$. We extend every function in the net to the whole real line by setting it equal
to 1 on $(-\infty, -m-1]$ and to $-1$ on $[m+1, \infty)$, and interpolating linearly in the intervals
$[-m-1, -m]$ and $[m, m+1]$. A draw $b$ from the prior $\Pi$ is now generated as follows:

(i) draw $m$ from the probability distribution $(p_m)$,

(ii) draw $n$ from the probability distribution $(q_n)$,

(iii) draw $b$ uniformly from $\mathcal{B}_{m,\varepsilon_n}$.

In other words, if $\mathcal{B}_{m,\varepsilon_n} = \{b_1^{m,n}, \ldots, b_{N_{m,n}}^{m,n}\}$ with $N_{m,n} = |\mathcal{B}_{m,\varepsilon_n}|$, then

$$\Pi = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} \sum_{k=1}^{N_{m,n}} \frac{p_m q_n}{N_{m,n}}\, \delta_{b_k^{m,n}}.$$
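The three-step sampling scheme and the extension to the real line can be sketched as follows. Several things here are assumptions made for illustration: a net element is represented by its values on an equally spaced grid over $[-m, m]$ (with linear interpolation in between), the dictionary `nets` is a hypothetical stand-in for the abstract $\varepsilon_n$-nets, and the distributions `p` and `q` are truncated to finitely many values.

```python
import numpy as np

def extend_to_real_line(values, m):
    """Extend a function given on [-m, m] to all of R: 1 on (-inf, -m-1],
    -1 on [m+1, inf), linear interpolation on [-m-1, -m] and [m, m+1]."""
    values = np.asarray(values, dtype=float)
    knots = np.linspace(-m, m, len(values))
    xp = np.concatenate(([-m - 1.0], knots, [m + 1.0]))
    fp = np.concatenate(([1.0], values, [-1.0]))

    def b(x):
        # np.interp is piecewise linear and constant outside [xp[0], xp[-1]]
        return np.interp(x, xp, fp)

    return b

def draw_from_net_prior(nets, p, q, rng):
    """Draw m ~ p, n ~ q, then an element of nets[(m, n)] uniformly at random."""
    m = int(rng.choice(len(p), p=p)) + 1       # positive integers 1, 2, ...
    n = int(rng.choice(len(q), p=q)) + 1
    net = nets[(m, n)]                         # hypothetical finite eps_n-net
    k = int(rng.integers(len(net)))
    return extend_to_real_line(net[k], m)
```

Forcing the drift to equal 1 far to the left and $-1$ far to the right pushes the process back towards the origin, which is what guarantees that every function in the support of the prior generates an ergodic diffusion.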

By construction, $\Pi$ assigns mass 1 to a countable set of functions that is uniformly
bounded and locally uniformly equicontinuous. Now consider $b_0 \in \mathcal{B}$ and $\varepsilon > 0$. We show
that condition (3.1) is satisfied. For every $b \in \mathcal{B}$ and $m \in \mathbb{N}$, we have

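$$\begin{aligned}
\|b - b_0\|^2_{2,\mu_0} &= \int_{|x| \le m} (b(x) - b_0(x))^2\,d\mu_0(x) + \int_{|x| > m} (b(x) - b_0(x))^2\,d\mu_0(x) \\
&\le \|b - b_0\|^2_{m,\infty} + 2 \int_{|x| > m} \big(b^2(x) + b_0^2(x)\big)\,d\mu_0(x) \\
&\le \|b - b_0\|^2_{m,\infty} + C \mu_0(|x| > m),
\end{aligned}$$

where $\|\cdot\|_{m,\infty}$ denotes the uniform norm on $[-m, m]$ and $C = 2(1 + \sup_{b \in \mathcal{B}} \|b\|^2_\infty)$. Hence,
for $m \in \mathbb{N}$ so large that $C \mu_0(|x| > m) < \varepsilon^2$, it holds that

$$\Pi(b : \|b - b_0\|^2_{2,\mu_0} < 2\varepsilon^2) \ge \Pi(b : \|b - b_0\|^2_{m,\infty} < \varepsilon^2).$$

For $n \in \mathbb{N}$ such that $\varepsilon_n < \varepsilon$ and $q_n > 0$, we have, by construction,

$$\Pi(b : \|b - b_0\|^2_{m,\infty} < \varepsilon^2) \ge \Pi(b : \|b - b_0\|_{m,\infty} < \varepsilon_n) \ge \frac{p_m q_n}{|\mathcal{B}_{m,\varepsilon_n}|} > 0.$$

This shows that condition (3.1) holds, and hence we have posterior consistency for this
class of priors.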

If $\mathcal{B} \subset C^s(\mathbb{R})$ for some $s \in (0,1)$ and $\sup_{b \in \mathcal{B}} \|b\|_s < \infty$ (see (1.2)), then clearly $\mathcal{B}$
satisfies the equicontinuity condition of Definition 3.4. In the following example, we use
wavelet expansions to construct a consistent prior on drift functions which belong to such
a class of Hölder functions.
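Example 4.2 (Wavelets). Suppose $\{\varphi_k, \psi_{j,k}\}_{k \in \mathbb{Z}, j \ge 0}$ is an orthonormal wavelet basis,
so that functions $f \in L^2(\mathbb{R})$ can be represented as

$$f = \sum_{k \in \mathbb{Z}} \langle f, \varphi_k \rangle \varphi_k + \sum_{k \in \mathbb{Z}} \sum_{j \ge 0} \langle f, \psi_{j,k} \rangle \psi_{j,k}$$

(the convergence being in $L^2(\mathbb{R})$). The functions $\psi_{j,k}$ are obtained from the mother
wavelet function $\psi$ by translation and scaling: $\psi_{j,k}(\cdot) = 2^{j/2} \psi(2^j \cdot - k)$. Similarly, the $\varphi_k$
are obtained from the father wavelet $\varphi$ (also called scaling function) by translation:
$\varphi_k(\cdot) = \varphi(\cdot - k)$.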

It is well known that under appropriate smoothness conditions on $\psi$, the rate of decay
of the wavelet coefficients characterizes the smoothness of the function $f$. Assume $\psi$ is
continuously differentiable and has compact support. Then $f \in C^s(\mathbb{R}) \cap L^2(\mathbb{R})$ if and only if
$\|f\|_\infty < \infty$ and

• $|\langle f, \varphi_k \rangle| \le C_f$ for all $k \in \mathbb{Z}$,

• $|\langle f, \psi_{j,k} \rangle| \le C_f 2^{-j(s+1/2)}$ for $j \ge 0$ and $k \in \mathbb{Z}$.

Moreover, $C_f$ can be taken as the product of the Hölder norm of $f$ and a constant (that
does not depend on $f$). For a proof, we refer to Section 6.7 in Hernández and Weiss [15],
see also Daubechies [6], Section 9.2. This characterization implies that for $s \in (0,1)$ and
$L > 0$, the collection

$$\mathcal{F}_{s,L} := \Big\{ f \in L^2(\mathbb{R}) : f = \sum_{k \in \mathbb{Z}} a_k \varphi_k + \sum_{k \in \mathbb{Z}} \sum_{j \ge 0} b_{j,k} \psi_{j,k},\ \ \sup_k |a_k| + \sup_j \sup_k 2^{j(s+1/2)} |b_{j,k}| \le L \Big\}$$

consists of $s$-Hölder continuous functions with uniformly bounded Hölder norms.

In addition to the smoothness condition on $\psi$, we assume that the scaling function $\varphi$ is
bounded and compactly supported. This implies that the function $\theta_\varphi(x) = \sum_k |\varphi(x - k)|$
is such that $\operatorname{ess\,sup}_x \theta_\varphi(x) < \infty$. This is a localization condition that is referred to as
Condition ($\theta$) in Härdle et al. [14] (page 77). By inequalities (9.34) and (9.35) on page
114 in Härdle et al. [14], Condition ($\theta$) implies that the supremum norm of $\sum_k a_k \varphi_k$
is equivalent to the sup-norm on the sequence $\{a_k\}_k$. In addition, the supremum norm
of $\sum_{j \ge 0} \sum_k b_{j,k} \psi_{j,k}$ is equivalent to the $\|\cdot\|_*$-norm of the doubly indexed sequence
$b = \{b_{j,k}\}_{j \ge 0, k}$, where

$$\|b\|_* = \sum_{j} 2^{j/2} \sup_k |b_{j,k}|.$$

It follows in particular that the uniform norm of the functions in $\mathcal{F}_{s,L}$ is uniformly
bounded.
To construct a prior on drift functions that is consistent for all true drift functions
$b_0 \in \mathcal{F}_{s,L}$ we first construct an auxiliary prior $\Pi'$ on the whole class $\mathcal{F}_{s,L}$ (which does
not only charge drift functions of ergodic diffusions). Let $J$ be a discrete random variable,
supported on $\mathbb{N}_0 = \{0, 1, \ldots\}$ and let $U_{j,k}, V_k$, for $j \in \mathbb{N}_0$, $k \in \mathbb{Z}$, be independent random
variables, independent of $J$, from a distribution with a strictly positive, continuous
density on its support $[-L, L]$. Define the prior $\Pi'$ as the law of the random function

$$x \mapsto \sum_{k \in \mathbb{Z}} V_k \varphi_k(x) + \sum_{j=0}^{J} \sum_{k \in \mathbb{Z}} \eta_j U_{j,k} \psi_{j,k}(x)$$

on $\mathbb{R}$, where $\eta_j = 2^{-j(s+1/2)}$. To arrive at a prior on drift functions of ergodic diffusions
we proceed as in the preceding example. We choose a probability distribution $(p_m)$ on
$\mathbb{N}$, with $p_m > 0$ for all $m$. A draw from the final prior $\Pi$ is then constructed as follows:

(i) Draw $m$ from the probability distribution $(p_m)$.

(ii) Independently of $m$, draw a random function from $\Pi'$ and restrict it to $[-m, m]$.

(iii) Extend the function to the whole real line by setting it equal to 1 on $(-\infty, -m-1]$
and to $-1$ on $[m+1, \infty)$, and interpolating linearly in the intervals $[-m-1, -m]$
and $[m, m+1]$.

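By construction, $\Pi$ assigns mass 1 to a set of drift functions satisfying the
equicontinuity and uniform boundedness conditions of Theorem 3.5. To prove that this prior yields
consistency for $b_0 \in \mathcal{F}_{s,L}$ it remains to show that (3.1) holds. Let $\varepsilon > 0$ be fixed. Then
exactly as in the preceding example, there exists an $m \in \mathbb{N}$ such that

$$\Pi(b : \|b - b_0\|^2_{2,\mu_0} < 2\varepsilon^2) \ge \Pi(b : \|b - b_0\|_{m,\infty} < \varepsilon).$$

Since the right-hand side is further bounded from below by

$$\Pi'(b : \|b - b_0\|_{m,\infty} < \varepsilon) \sum_{n \ge m} p_n,$$

it now suffices to show that $\Pi'(b : \|b - b_0\|_{m,\infty} < \varepsilon) > 0$.

To see that this is true, let $a_k^0$ and $b_{j,k}^0$ be the wavelet coefficients of the true drift
function $b_0$ and let

$$B = \sum_k V_k \varphi_k + \sum_{j=0}^{J} \sum_k \eta_j U_{j,k} \psi_{j,k}$$

be distributed according to $\Pi'$. Then

$$\|B - b_0\|_{m,\infty} \le \Big\| \sum_k (V_k - a_k^0) \varphi_k \Big\|_{m,\infty} + \Big\| \sum_{j=0}^{J} \sum_k (\eta_j U_{j,k} - b_{j,k}^0) \psi_{j,k} \Big\|_{m,\infty} + \Big\| \sum_{j > J} \sum_k b_{j,k}^0 \psi_{j,k} \Big\|_{m,\infty}. \tag{4.1}$$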
The first term on the right is bounded by a constant times

$$\max_{k \in K_m} |V_k - a_k^0|,$$

where $K_m$ is a finite set of integers, since $\varphi$ is compactly supported.

Since $|a_k^0| \le L$, the $V_k$ have full support in $[-L, L]$ and $K_m$ is finite, this quantity is
bounded by $\varepsilon/3$ with positive probability. By the equivalence of norms mentioned above
and the definition of $\mathcal{F}_{s,L}$, there exists a constant $c > 0$ such that the third term on the
right of (4.1) is bounded by a constant multiple of

$$\sum_{j > J} 2^{j/2} \max_k |b_{j,k}^0| \le L \sum_{j > J} 2^{j/2}\, 2^{-j(s+1/2)},$$

and hence by $c L 2^{-Js}$.

Hence, if we choose $J_0$ such that $c L 2^{-J_0 s} < \varepsilon/3$, then the third term on the right of
(4.1) is bounded by $\varepsilon/3$ with probability at least $P(J = J_0) > 0$. On the event $\{J = J_0\}$,
the second term on the right-hand side of (4.1) is bounded by a constant times

$$\sum_{j=0}^{J_0} 2^{j/2} \max_{k \in K'_m} |\eta_j U_{j,k} - b_{j,k}^0| \le (J_0 + 1)\, 2^{J_0/2} \max_{j \le J_0,\ k \in K'_m} |\eta_j U_{j,k} - b_{j,k}^0|.$$

The set $K'_m$ is finite, since $\psi$ is compactly supported. Since $|b_{j,k}^0| \le \eta_j L$ and the $U_{j,k}$
have full support in $[-L, L]$, the right-hand side of this display is less than $\varepsilon/3$ as well
with positive probability. Combining the considerations above and using the fact that $J$,
the $V_k$ and the $U_{j,k}$ are all independent, we conclude that $\Pi'(b : \|b - b_0\|_{m,\infty} < \varepsilon) > 0$.



5. Proof of Theorem 3.5 

Recall that under $P_b$, the observations $X_0, X_\Delta, \ldots$ form a discrete-time Markov chain with
positive, continuous transition densities $p_b(\Delta, x, y)$ and a positive, continuous invariant
density $\pi_b$. For $b \in \mathcal{B}$ we consider the associated Kullback-Leibler divergence

$$\mathrm{KL}(b_0, b) = \iint p_{b_0}(\Delta, x, y) \log \frac{p_{b_0}(\Delta, x, y)}{p_b(\Delta, x, y)}\, \pi_{b_0}(x)\,dx\,dy.$$

The following lemma shows that condition (3.1) of Theorem 3.5 implies that we have the
Kullback-Leibler property relative to this distance measure.

Lemma 5.1. Condition (3.1) of Theorem 3.5 implies that for every $\varepsilon > 0$, we have
$\Pi(b : \mathrm{KL}(b_0, b) < \varepsilon) > 0$.

Proof. To prove the lemma we bound the quantity $\mathrm{KL}(b_0, b)$ from above by a multiple
of $\|b_0 - b\|^2_{2,\mu_0}$. For convenience we introduce the notation $K(P, Q) = E_P \log dP/dQ$ for
the Kullback-Leibler divergence between two probability measures $P$ and $Q$ on the same
$\sigma$-field. The law of a random element $Z$ under the underlying probability measure $Q$ is
denoted by $\mathcal{L}(Z \mid Q)$.

Under $P_b$, for every $b \in \mathcal{B}$, the pair $(X_0, X_\Delta)$ has joint density $(x, y) \mapsto \pi_b(x) p_b(\Delta, x, y)$
relative to Lebesgue measure. Hence, the Kullback-Leibler divergence between
$\mathcal{L}((X_0, X_\Delta) \mid P_{b_0})$ and $\mathcal{L}((X_0, X_\Delta) \mid P_b)$ equals

$$\iint \pi_{b_0}(x) p_{b_0}(\Delta, x, y) \log \frac{\pi_{b_0}(x) p_{b_0}(\Delta, x, y)}{\pi_b(x) p_b(\Delta, x, y)}\,dx\,dy = \mathrm{KL}(b_0, b) + K(\mu_{b_0}, \mu_b).$$

Now $(X_0, X_\Delta)$ is a measurable functional of the continuous path $(X_t : t \in [0, \Delta])$.
Hence, the Kullback-Leibler divergence between $\mathcal{L}((X_0, X_\Delta) \mid P_{b_0})$ and $\mathcal{L}((X_0, X_\Delta) \mid P_b)$
is bounded by the Kullback-Leibler divergence between the laws $\mathcal{L}((X_t : t \in [0, \Delta]) \mid P_{b_0})$
and $\mathcal{L}((X_t : t \in [0, \Delta]) \mid P_b)$ of the full path $(X_t : t \in [0, \Delta])$ under $P_{b_0}$ and $P_b$. (To see this,
observe that the likelihood for $(X_0, X_\Delta)$ is the conditional expectation of the likelihood
for $(X_t : t \in [0, \Delta])$ and use the concavity of the logarithm and Jensen's inequality.) By
Girsanov's theorem, the latter Kullback-Leibler divergence is given by

$$-E_{b_0}\Big( \log \frac{\pi_b(X_0)}{\pi_{b_0}(X_0)} + \int_0^\Delta (b - b_0)(X_s)\,dW_s - \frac{1}{2} \int_0^\Delta (b - b_0)^2(X_s)\,ds \Big),$$

where $W$ is a $P_{b_0}$-Brownian motion. Using the stationarity of the process $X$ under $P_{b_0}$,
we see that this equals

$$K(\mu_{b_0}, \mu_b) + \frac{\Delta}{2} \|b - b_0\|^2_{2,\mu_0}.$$

Hence, we find that $2\,\mathrm{KL}(b_0, b) \le \Delta \|b - b_0\|^2_{2,\mu_0}$. □
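The inequality $2\,\mathrm{KL}(b_0, b) \le \Delta \|b - b_0\|^2_{2,\mu_0}$ can be checked numerically in a case where both sides have closed forms. The sketch below uses the hypothetical linear drifts $b_0(x) = -\theta_0 x$ and $b(x) = -\theta x$, for which the transition laws are Gaussian (a standard fact about the Ornstein-Uhlenbeck process). These drifts are unbounded, so they are outside the classes covered by Theorem 3.5; the example only illustrates the bound itself, and all names are assumptions made here.

```python
import numpy as np

def ou_transition_kl(theta0, theta, delta):
    """KL(b0, b) for b0(x) = -theta0 * x and b(x) = -theta * x."""
    v0 = (1.0 - np.exp(-2.0 * theta0 * delta)) / (2.0 * theta0)   # transition variance under b0
    v = (1.0 - np.exp(-2.0 * theta * delta)) / (2.0 * theta)      # transition variance under b
    stationary_var = 1.0 / (2.0 * theta0)                         # variance of mu_0
    # squared gap between conditional means, averaged over X_0 ~ mu_0
    mean_gap_sq = (np.exp(-theta0 * delta) - np.exp(-theta * delta)) ** 2 * stationary_var
    # Gaussian KL divergence, averaged over the starting point
    return 0.5 * (np.log(v / v0) + (v0 + mean_gap_sq) / v - 1.0)

def girsanov_bound(theta0, theta, delta):
    """(delta / 2) * ||b - b0||^2 in L^2(mu_0) for the same pair of drifts."""
    return 0.5 * delta * (theta - theta0) ** 2 / (2.0 * theta0)

for theta in (0.5, 2.0, 3.0):
    for delta in (0.1, 1.0, 5.0):
        kl = ou_transition_kl(1.0, theta, delta)
        bound = girsanov_bound(1.0, theta, delta)
        print(f"theta={theta}, delta={delta}: KL={kl:.4f}, bound={bound:.4f}")
```

For small $\Delta$ the two quantities are close, while for large $\Delta$ the bound becomes loose, since little information about the path is lost or gained by observing only its endpoints once the chain has nearly mixed.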

For any sequence of measurable sets $C_n \subset \mathcal{B}$ we have that the posterior measure of
$C_n$ can be written as

$$\Pi(C_n \mid X_0, \ldots, X_{n\Delta}) = \frac{\int_{C_n} L_n(b)\,\Pi(db)}{\int_{\mathcal{B}} L_n(b)\,\Pi(db)},$$

where

$$L_n(b) = \frac{\pi_b(X_0)}{\pi_{b_0}(X_0)} \prod_{i=1}^{n} \frac{p_b(\Delta, X_{(i-1)\Delta}, X_{i\Delta})}{p_{b_0}(\Delta, X_{(i-1)\Delta}, X_{i\Delta})}$$

is the likelihood ratio. Since we have the Kullback-Leibler property and our Markov chain
satisfies a law of large numbers, the denominator in the expression for the posterior can
be dealt with in the usual manner. This leads to the following result.

Lemma 5.2. Suppose that for every $\varepsilon > 0$, we have $\Pi(b : \mathrm{KL}(b_0, b) < \varepsilon) > 0$. If for a
collection of measurable subsets $C_n \subset \mathcal{B}$ there exists some $c > 0$ such that

$$e^{nc} \int_{C_n} L_n(b)\,\Pi(db) \to 0, \qquad P_{b_0}\text{-almost surely}, \tag{5.1}$$

then $\Pi(C_n \mid X_0, \ldots, X_{n\Delta}) \to 0$, $P_{b_0}$-almost surely.
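Proof. By ergodicity, it $P_{b_0}$-a.s. holds that

$$\frac{1}{n} \log L_n(b) \to -\mathrm{KL}(b_0, b).$$

In particular, for $\eta > 0$ arbitrary and $b$ such that $\mathrm{KL}(b_0, b) < \eta$, it $P_{b_0}$-a.s. holds that
$\liminf_{n \to \infty} e^{n\alpha} L_n(b) \ge 1$ for all $\alpha > \eta$. It follows by Fatou's lemma that $P_{b_0}$-a.s.,

$$\liminf_{n \to \infty} e^{n\alpha} \int L_n(b)\,\Pi(db) \ge \int_{b : \mathrm{KL}(b_0, b) < \eta} \liminf_{n \to \infty} e^{n\alpha} L_n(b)\,\Pi(db) \ge \Pi(b : \mathrm{KL}(b_0, b) < \eta)$$

and hence

$$\limsup_{n \to \infty} \Pi(C_n \mid X_0, \ldots, X_{n\Delta}) \le \frac{\limsup_{n \to \infty} e^{n\alpha} \int_{C_n} L_n(b)\,\Pi(db)}{\Pi(b : \mathrm{KL}(b_0, b) < \eta)}.$$

In view of Lemma 5.1 and the fact that we can take $\alpha > 0$ arbitrarily small, this completes
the proof. □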

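We proceed with the proof of the theorem. By definition of the topology on $\mathcal{B}$ it suffices
to show that $\Pi(B \mid X_0, \ldots, X_{n\Delta}) \to 0$, $P_{b_0}$-almost surely, where

$$B = \{b \in \mathcal{B} : \|P_\Delta^b f - P_\Delta^{b_0} f\|_{1,\nu} > \varepsilon\},$$

with $\varepsilon > 0$ and $f$ a continuous function on $\mathbb{R}$ that is uniformly bounded by 1. We fix $\varepsilon$,
$f$ and the set $B$ from this point on.

In view of Lemma A.1, the assumptions of Theorem 3.5 imply an equicontinuity
property for the collections of functions

$$\{(P_\Delta^b f)|_K : b \in \mathcal{B}\},$$

for $K \subset \mathbb{R}$ a compact set. Arguing as in Tang and Ghosal [24], this allows us to derive
the following useful intermediate result.

Lemma 5.3. There exists a compact set $K \subset \mathbb{R}$, a positive integer $N$ and bounded
intervals $I_1, \ldots, I_N$ that cover $K$ such that

$$B \subset \Big( \bigcup_{j=1}^{N} B_j^+ \Big) \cup \Big( \bigcup_{j=1}^{N} B_j^- \Big),$$

where

$$B_j^+ = \Big\{ b \in B : P_\Delta^b f(x) - P_\Delta^{b_0} f(x) > \frac{\varepsilon}{4\nu(K)}\ \ \forall x \in I_j \Big\},$$

and $B_j^-$ is defined analogously, with the condition $P_\Delta^b f(x) - P_\Delta^{b_0} f(x) < -\varepsilon/(4\nu(K))$ for all $x \in I_j$,
for $j = 1, \ldots, N$.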

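Proof. Since $\nu$ is a finite Borel measure on the line there exists a compact subset $K \subset \mathbb{R}$
such that $\nu(K^c) < \varepsilon/4$. Let $\delta > 0$ and cover $K$ with $N < \infty$ intervals with width $\delta/2$;
denote the intervals by $I_1, \ldots, I_N$. First, we show that $B \subset \bigcup_{j=1}^{N} B_j$, where

$$B_j = \Big\{ b \in B : |P_\Delta^b f(x) - P_\Delta^{b_0} f(x)| > \frac{\varepsilon}{4\nu(K)}\ \ \forall x \in I_j \Big\}.$$

Suppose the inclusion is not true. Then there exists a $b \in B$ such that for each $j \in
\{1, \ldots, N\}$ there exists a point $z_j \in I_j$ such that

$$|P_\Delta^b f(z_j) - P_\Delta^{b_0} f(z_j)| \le \frac{\varepsilon}{4\nu(K)}. \tag{5.2}$$

Now

$$\begin{aligned}
\|P_\Delta^b f - P_\Delta^{b_0} f\|_{1,\nu} &= \int_K |P_\Delta^b f(x) - P_\Delta^{b_0} f(x)|\,\nu(dx) + \int_{K^c} |P_\Delta^b f(x) - P_\Delta^{b_0} f(x)|\,\nu(dx) \\
&\le \nu(K) \max_j \max_{x \in I_j} |P_\Delta^b f(x) - P_\Delta^{b_0} f(x)| + 2 \|f\|_\infty \nu(K^c) \\
&\le \nu(K) \max_j \max_{x \in I_j} \big( |P_\Delta^b f(x) - P_\Delta^b f(z_j)| + |P_\Delta^b f(z_j) - P_\Delta^{b_0} f(z_j)| \\
&\qquad\qquad + |P_\Delta^{b_0} f(z_j) - P_\Delta^{b_0} f(x)| \big) + \varepsilon/2.
\end{aligned}$$

By local uniform equicontinuity and Lemma A.1, we can find a $\delta$ such that the first term
can be bounded by $\varepsilon/(8\nu(K))$. The second term can be bounded by (5.2). By continuity
the third term can be bounded by $\varepsilon/(8\nu(K))$. Therefore, the preceding display can be
bounded by $\varepsilon$, contradicting that $b \in B$. Thus $B \subset \bigcup_{j=1}^{N} B_j$.

Since the function $P_\Delta^b f - P_\Delta^{b_0} f$ is continuous and $I_j$ is connected, we have that $B_j$ is
included in

$$\Big\{ b \in B : P_\Delta^b f(x) - P_\Delta^{b_0} f(x) > \frac{\varepsilon}{4\nu(K)}\ \ \forall x \in I_j \Big\} \cup \Big\{ b \in B : P_\Delta^b f(x) - P_\Delta^{b_0} f(x) < -\frac{\varepsilon}{4\nu(K)}\ \ \forall x \in I_j \Big\} =: B_j^+ \cup B_j^-.$$

This completes the proof of the lemma. □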

As a consequence of this lemma, the proof of the theorem is complete once we show
that for $j = 1, \ldots, N$,

\[
\Pi(B_j^+ \mid X_0, \ldots, X_{n\Delta}) \to 0, \qquad \Pi(B_j^- \mid X_0, \ldots, X_{n\Delta}) \to 0
\]

$P_{b_0}$-almost surely. We give the details for the sets $B_j^+$; the argument for the sets $B_j^-$ is
completely analogous. Here, we follow the approach of [25]. We fix $j \in \{1, \ldots, N\}$ and
consider the stochastic process $D$ defined by

\[
D_n = \Bigl(\int_{B_j^+} L_n(b)\,\Pi(db)\Bigr)^{1/2}.
\]

We will show that $P_{b_0}$-almost surely, $D_n$ converges to 0 exponentially fast. According to
Lemma 5.2 this is sufficient.
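
To make the phrase "converges to 0 exponentially fast" concrete, here is a minimal numerical sketch. It is not the setting of the theorem: it assumes an Ornstein-Uhlenbeck drift $b(x) = -\theta x$ (which is unbounded), a prior supported on finitely many values of $\theta$, and it takes for the set $B$ all parameter values other than the true one. The initial-density factor of the likelihood ratio is dropped for simplicity. All function names are hypothetical.

```python
import numpy as np

def ou_transition_params(theta, delta):
    """Mean multiplier and variance of the exact OU transition over a time step delta."""
    a = np.exp(-theta * delta)
    var = (1.0 - np.exp(-2.0 * theta * delta)) / (2.0 * theta)
    return a, var

def log_transition_density(theta, delta, x, y):
    """Log of the Gaussian transition density p_theta(delta, x, y)."""
    a, var = ou_transition_params(theta, delta)
    return -0.5 * np.log(2.0 * np.pi * var) - (y - a * x) ** 2 / (2.0 * var)

def simulate_ou(theta, delta, n, rng):
    """Exact simulation of X_0, X_delta, ..., X_{n delta}, started in stationarity."""
    a, var = ou_transition_params(theta, delta)
    x = np.empty(n + 1)
    x[0] = rng.normal(0.0, np.sqrt(1.0 / (2.0 * theta)))
    noise = rng.normal(0.0, np.sqrt(var), size=n)
    for i in range(n):
        x[i + 1] = a * x[i] + noise[i]
    return x

def log_D(x, delta, theta_true, thetas_B, prior_B):
    """log D_n for n = 1, ..., len(x) - 1, where D_n^2 = sum_{theta in B} L_n(theta) * prior(theta).
    Simplification: the initial-density factor of L_n is omitted."""
    x_prev, x_next = x[:-1], x[1:]
    log_p0 = log_transition_density(theta_true, delta, x_prev, x_next)
    # log L_n(theta): cumulative sums of the log likelihood-ratio increments
    log_L = np.stack([
        np.cumsum(log_transition_density(th, delta, x_prev, x_next) - log_p0)
        for th in thetas_B
    ])
    terms = log_L + np.log(np.asarray(prior_B))[:, None]
    m = terms.max(axis=0)
    log_D2 = m + np.log(np.exp(terms - m).sum(axis=0))  # numerically stable log-sum-exp
    return 0.5 * log_D2

rng = np.random.default_rng(0)
delta, theta_true = 1.0, 1.0
x = simulate_ou(theta_true, delta, n=2000, rng=rng)
log_Dn = log_D(x, delta, theta_true, thetas_B=[0.5, 2.0], prior_B=[1 / 3, 1 / 3])

# Average decay rate of log D_n per observation at a few sample sizes
print(log_Dn[[99, 499, 1999]] / np.array([100, 500, 2000]))
```

If $D_n$ decays exponentially, the printed ratios should be negative and of comparable size across sample sizes.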

Note that since $L_n$ is the likelihood, we have $E_{b_0} D_n^2 = \Pi(B_j^+) < \infty$. Next, we are
interested in the conditional expectation $E_{b_0}(D_{n+1} \mid \mathcal{F}_n)$, where $(\mathcal{F}_n)$ is the filtration generated
by the Markov chain $(X_{n\Delta})_{n=0,1,\ldots}$. Recall that the Hellinger distance $h(p, q)$ between two
densities $p, q$ relative to a dominating measure $\mu$ is defined by $h^2(p, q) = \int (\sqrt{p} - \sqrt{q})^2\,d\mu$.
It satisfies $h^2(p, q) = 2 - 2A(p, q)$, where $A(p, q) = \int \sqrt{pq}\,d\mu$ is the Hellinger affinity
between $p$ and $q$. Then with $p_{n,C}$ the random transition density

\[
p_{n,C}(\Delta, x, y) = \frac{\int_C p_b(\Delta, x, y) L_n(b)\,\Pi(db)}{\int_C L_n(b)\,\Pi(db)},
\]

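we have

\[
\begin{aligned}
E_{b_0}(D_{n+1} \mid \mathcal{F}_n)
&= E_{b_0}\Bigl(\Bigl(\int_{B_j^+} \frac{p_b(\Delta, X_{n\Delta}, X_{(n+1)\Delta})}{p_{b_0}(\Delta, X_{n\Delta}, X_{(n+1)\Delta})} L_n(b)\,\Pi(db)\Bigr)^{1/2} \Bigm| \mathcal{F}_n\Bigr) \\
&= \int \Bigl(\int_{B_j^+} \frac{p_b(\Delta, X_{n\Delta}, y)}{p_{b_0}(\Delta, X_{n\Delta}, y)} L_n(b)\,\Pi(db)\Bigr)^{1/2} p_{b_0}(\Delta, X_{n\Delta}, y)\,dy \\
&= \int \Bigl(\int_{B_j^+} p_b(\Delta, X_{n\Delta}, y) L_n(b)\,\Pi(db)\; p_{b_0}(\Delta, X_{n\Delta}, y)\Bigr)^{1/2} dy \\
&= D_n A_n,
\end{aligned}
\]

where $A_n = A(p_{n,B_j^+}(\Delta, X_{n\Delta}, \cdot), p_{b_0}(\Delta, X_{n\Delta}, \cdot))$. Next, we bound $A_n$. First, note that since
$2\|f\|_\infty h(p, q) \ge |\int f(p - q)\,d\mu|$, we have $h^2(p, q) \ge \frac{1}{4}(\int f(p - q)\,d\mu)^2$ for functions $f$ that
are uniformly bounded by 1. Therefore,

\[
A(p, q) = 1 - \tfrac{1}{2} h^2(p, q) \le 1 - \tfrac{1}{8}\Bigl(\int f(p - q)\,d\mu\Bigr)^2.
\]

Hence, to bound $A_n$ it suffices to lower bound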

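\[
\int f(y)\bigl[p_{n,B_j^+}(\Delta, X_{n\Delta}, y) - p_{b_0}(\Delta, X_{n\Delta}, y)\bigr]\,dy,
\]

which equals

\[
\frac{1}{\int_{B_j^+} L_n(b)\,\Pi(db)} \int_{B_j^+} \Bigl(\int f(y)\bigl[p_b(\Delta, X_{n\Delta}, y) - p_{b_0}(\Delta, X_{n\Delta}, y)\bigr]\,dy\Bigr) L_n(b)\,\Pi(db).
\]

By the definition of $B_j^+$ in Lemma 5.3, if $X_{n\Delta} \in I_j$ the inner integral is lower bounded by
$\varepsilon/(4\nu(K))$. This implies that

\[
A_n \le 1 - \frac{\varepsilon^2}{128\,\nu(K)^2}\, 1_{\{X_{n\Delta} \in I_j\}}.
\]

Hence,

\[
E_{b_0}(D_{n+1} \mid \mathcal{F}_n) \le D_n (1 - k\varepsilon^2)^{1_{\{X_{n\Delta} \in I_j\}}},
\]

where $k = 1/(128\,\nu(K)^2)$. We conclude that the process

\[
M_n = D_n (1 - k\varepsilon^2)^{-\sum_{i=0}^{n-1} 1_{\{X_{i\Delta} \in I_j\}}}
\]

is an $(\mathcal{F}_n)$-supermartingale under the measure $P_{b_0}$ (note that $M_n$ is bounded by the
integrable process $D_n(1 - k\varepsilon^2)^{-n}$, hence $M_n$ is integrable). By Doob's martingale
convergence theorem, we have $M_n \to M_\infty$ almost surely, for some finite-valued random
variable $M_\infty$. By ergodicity, we have $n^{-1}\sum_{i=0}^{n-1} 1_{\{X_{i\Delta} \in I_j\}} \to \mu_{b_0}(I_j) > 0$ almost surely,
where $\mu_{b_0}$ denotes the invariant probability measure. Consequently
$D_n = M_n (1 - k\varepsilon^2)^{\sum_{i=0}^{n-1} 1_{\{X_{i\Delta} \in I_j\}}}$ converges to 0 exponentially fast, and an
application of Lemma 5.2 completes the proof.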
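
The Hellinger relations used in this proof can be checked numerically. The sketch below is only an illustration under assumed inputs: two Gaussian densities stand in for the transition densities, $f = \tanh$ stands in for a test function bounded by 1, and integrals are approximated by Riemann sums on a grid. It checks the identity $h^2 = 2 - 2A$ and the bound $A \le 1 - \frac{1}{8}(\int f(p-q)\,d\mu)^2$; inserting the lower bound $\varepsilon/(4\nu(K))$ for the integral gives the constant $k = 1/(128\,\nu(K)^2)$.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Grid for a simple Riemann-sum quadrature; the tails beyond +-12 are negligible here
y = np.linspace(-12.0, 12.0, 240001)
dy = y[1] - y[0]

p = normal_pdf(y, 0.0, 1.0)   # stand-in for one transition density
q = normal_pdf(y, 0.8, 1.3)   # stand-in for the other
f = np.tanh(y)                # test function with |f| <= 1

affinity = np.sum(np.sqrt(p * q)) * dy                      # A(p, q)
hellinger_sq = np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dy  # h^2(p, q)
gap = np.sum(f * (p - q)) * dy                              # integral of f (p - q)

print("h^2 and 2 - 2A:", hellinger_sq, 2.0 - 2.0 * affinity)
print("A and 1 - gap^2 / 8:", affinity, 1.0 - gap ** 2 / 8.0)
```

The second printed pair should show the affinity below the upper bound; how much slack there is depends on the chosen densities and test function.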

6. Concluding remarks 

In this paper, we obtain conditions for posterior consistency of nonparametric Bayesian
drift estimation for low-frequency observations from a scalar ergodic diffusion. The main
theorem and the subsequent examples provide several priors for which consistency is
guaranteed. As discussed in the Introduction, data augmentation techniques that have
proven effective in parametric settings are in principle usable for the numerical
implementation of nonparametric models as well. Preliminary investigations indicate
that practically feasible procedures can indeed be constructed, but more work on
computational issues is still needed.

The results and proofs in this paper show that in this low-frequency setting, obtaining
consistency relative to a rather weak topology is already quite involved. Developing a
testing approach to posterior consistency in this setting would be very challenging but
equally interesting. It would allow one to obtain consistency in stronger topologies,
rates of contraction, and a relaxation of the boundedness and equicontinuity conditions.
For general diffusions this seems rather difficult, but some progress might be
possible for diffusions on compact state spaces.

Appendix: An equicontinuity property of the transition operators

The concept of local uniform equicontinuity is given in Definition 3.4. 

Lemma A.1. If $\sup_{b \in \mathcal{B}} \|b\|_\infty < \infty$ and $\mathcal{B}$ is locally uniformly equicontinuous, then for
every $f \in BC(\mathbb{R})$ and $t > 0$, the collection $\{P^b_t f : b \in \mathcal{B}\}$ is locally uniformly
equicontinuous as well.

Proof. Let $K \subset \mathbb{R}$ be a compact set. For $P^0_x$ the law of the Brownian motion starting in
$x$ we have, by Girsanov's theorem,

\[
P^b_t f(x) = E^b_x f(X_t) = E^0_x f(X_t) \exp\Bigl(\int_0^t b(X_s)\,dX_s - \frac{1}{2}\int_0^t b^2(X_s)\,ds\Bigr).
\]

Under $P^0_x$ the process $X$ has the same law as $x + W$, for $W$ a standard Brownian motion
starting in 0. Hence, we get

\[
P^b_t f(x) = E f(x + W_t) L_x,
\]

where

\[
L_u = e^{l_u}, \qquad l_u = \int_0^t b(u + W_s)\,dW_s - \frac{1}{2}\int_0^t b^2(u + W_s)\,ds.
\]

It follows that

\[
\begin{aligned}
|P^b_t f(x) - P^b_t f(y)| &\le E|f(x + W_t) L_x - f(y + W_t) L_y| \\
&\le E|f(x + W_t)||L_x - L_y| + E|L_y||f(x + W_t) - f(y + W_t)| \\
&=: I + II.
\end{aligned}
\]

We first bound the term $I$. By the fact that $|e^a - e^b| \le |a - b|(e^a + e^b)$ and Cauchy-Schwarz,

\[
\begin{aligned}
|I|^2 &\le \|f\|_\infty^2 (E|L_x - L_y|)^2 \\
&\le \|f\|_\infty^2 (E|l_x - l_y||L_x + L_y|)^2 \\
&\le \|f\|_\infty^2\, E|l_x - l_y|^2\, E(L_x + L_y)^2.
\end{aligned}
\]

We have

\[
l_x - l_y = \int_0^t \bigl(b(x + W_s) - b(y + W_s)\bigr)\,dW_s - \frac{1}{2}\int_0^t \bigl(b^2(x + W_s) - b^2(y + W_s)\bigr)\,ds. \tag{A.1}
\]

For the first term on the right the Itô isometry gives, for $x, y \in K$,

\[
\begin{aligned}
E\Bigl(\int_0^t \bigl(b(x + W_s) - b(y + W_s)\bigr)\,dW_s\Bigr)^2
&= E \int_0^t \bigl(b(x + W_s) - b(y + W_s)\bigr)^2\,ds \\
&\le E \int_0^t \bigl(b(x + W_s) - b(y + W_s)\bigr)^2 1_{\{\sup_{s \le t}|W_s| \le M\}}\,ds + 4t\|b\|_\infty^2 P\Bigl(\sup_{s \le t}|W_s| > M\Bigr) \\
&\le t \sup_{\substack{u, v \in K' \\ |u - v| \le |x - y|}} |b(u) - b(v)|^2 + 4t\|b\|_\infty^2 P\Bigl(\sup_{s \le t}|W_s| > M\Bigr)
\end{aligned}
\]

for every $M > 0$, where $K' = \{x + y : x \in K, y \in [-M, M]\}$. The assumptions on $\mathcal{B}$ imply
that by choosing $M$ large enough and $|x - y|$ small enough, the right-hand side can be
made arbitrarily small, uniformly in $\mathcal{B}$. The second term on the right of (A.1) can be
handled in the same manner, using also the fact that $|b^2(u) - b^2(v)| \le 2\|b\|_\infty |b(u) - b(v)|$.
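
The tail probability appearing in the Itô isometry bound can be made explicit. The following standard estimate is not part of the original argument and is included only as an illustration; it uses the symmetry of $W$, the reflection principle, and the Gaussian tail bound $P(Z > z) \le e^{-z^2/2}$ for $z \ge 0$:

\[
P\Bigl(\sup_{s \le t}|W_s| > M\Bigr) \le 2P\Bigl(\sup_{s \le t} W_s > M\Bigr) = 4P(W_t > M) \le 4e^{-M^2/(2t)}.
\]

Under this estimate the term $4t\|b\|_\infty^2 P(\sup_{s \le t}|W_s| > M)$ is at most $16\,t\,\|b\|_\infty^2 e^{-M^2/(2t)}$, which is small uniformly over the class as soon as $\sup_b \|b\|_\infty < \infty$ and $M$ is large.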

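To complete the bound for term $I$, we note that $E(L_x + L_y)^2 \le 2EL_x^2 + 2EL_y^2$ and we
write

\[
\begin{aligned}
L_u^2 &= \exp\Bigl(\int_0^t 2b(u + W_s)\,dW_s - \int_0^t b^2(u + W_s)\,ds\Bigr) \\
&= \exp\Bigl(\int_0^t b^2(u + W_s)\,ds\Bigr) \exp\Bigl(\int_0^t 2b(u + W_s)\,dW_s - \frac{1}{2}\int_0^t (2b)^2(u + W_s)\,ds\Bigr).
\end{aligned}
\]

The first factor on the right is bounded by $\exp(t\|b\|_\infty^2)$ and the second one is the time $t$
value of a martingale that starts in 1. Hence,

\[
EL_u^2 \le e^{t\|b\|_\infty^2}.
\]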

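Finally, observe that by Cauchy-Schwarz and a bound derived above,

\[
|II|^2 \le e^{t\|b\|_\infty^2}\, E|f(x + W_t) - f(y + W_t)|^2.
\]

The prefactor is bounded uniformly in $b \in \mathcal{B}$ by assumption, and since $f$ is bounded and
continuous the expectation tends to 0 as $|x - y| \to 0$, uniformly for $x, y \in K$, by the same
truncation argument as used for term $I$. This completes the proof. □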
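
As a sanity check on the Girsanov representation and on the second-moment bound used in this proof, the sketch below treats the one case with closed-form answers: a constant drift $b \equiv c$ (an assumption made here purely for illustration). Then $L_x = \exp(cW_t - c^2 t/2)$, the process is $x + ct + W_t$, and for $f = \cos$ one has $P^b_t f(x) = \cos(x + ct)e^{-t/2}$. Expectations over $W_t$ are computed with Gauss-Hermite quadrature; the helper name `gaussian_expectation` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(g, n_nodes=80):
    """Approximate E g(Z) for Z ~ N(0, 1) by Gauss-Hermite quadrature
    (probabilists' version, weight function exp(-z^2 / 2))."""
    nodes, weights = hermegauss(n_nodes)
    return np.sum(weights * g(nodes)) / np.sqrt(2.0 * np.pi)

c, t, x = 0.7, 1.5, 0.3   # constant drift b = c, time horizon t, starting point x
f = np.cos                # bounded continuous test function

def girsanov_weight(z):
    """L_x = exp(c W_t - c^2 t / 2) with W_t = sqrt(t) * z; free of x for constant drift."""
    return np.exp(c * np.sqrt(t) * z - 0.5 * c ** 2 * t)

# P_t f(x) via the Girsanov representation E f(x + W_t) L_x
girsanov = gaussian_expectation(lambda z: f(x + np.sqrt(t) * z) * girsanov_weight(z))
# Closed form of E cos(x + c t + W_t)
direct = np.cos(x + c * t) * np.exp(-0.5 * t)

first_moment = gaussian_expectation(girsanov_weight)                    # martingale started at 1
second_moment = gaussian_expectation(lambda z: girsanov_weight(z) ** 2)  # E L^2

print(girsanov, direct)
print(first_moment, second_moment, np.exp(c ** 2 * t))
```

For constant drift the bound $EL_u^2 \le e^{t\|b\|_\infty^2}$ holds with equality, so the last two printed numbers should agree up to quadrature error.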